Why:
To quantify the impact of a change
How:
- Choose the metrics to evaluate. You can also select invariant metrics (metrics that should not change between groups) as a sanity check
- Power analysis: define the minimum detectable effect, significance threshold (alpha), and statistical power (1 - beta) to find the required sample size, which in turn determines how long to run the experiment
- Create the control and treatment groups, ensuring there are no confounding factors that could also affect the metrics
- (Optional) Run an A/A test to ensure the control and treatment groups are properly selected and there is no interference (see the section below for other ways of detecting interference)
- Run the experiment
- Analyze the results by estimating the effect size and testing whether it is statistically significant, then make a decision that trades off business cost against the observed effect size
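The power analysis step above can be sketched as follows. This is a minimal sketch, assuming the metric is a conversion rate compared with a two-sided two-proportion z-test and using the standard normal-approximation formula. The function names, the example baseline rate, and the traffic figure are hypothetical, not taken from these notes.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline_rate, mde, alpha=0.05, power=0.8):
    """Approximate users needed per group for a two-sided two-proportion z-test.

    baseline_rate: conversion rate in control (assumed known from history)
    mde: minimum detectable effect, as an absolute lift (e.g. 0.02 = +2 points)
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


def days_to_run(n_per_group, daily_users_per_group):
    """Translate the sample size into an experiment duration in whole days."""
    return ceil(n_per_group / daily_users_per_group)


# Hypothetical inputs: 10% baseline, +2 point MDE, 1,000 users per group per day
n = sample_size_per_group(baseline_rate=0.10, mde=0.02)
print(n, days_to_run(n, daily_users_per_group=1000))
```

Halving the MDE roughly quadruples the required sample size, which is why the MDE is usually the input worth debating before the experiment starts.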
Gotchas:
- The timing of a feature launch can have a big impact on the results
- Watch out for interference and unintended effects
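- Watch out for users resisting change, which can distort early results
- Watch out for multiple metrics: with many comparisons, some will be significant by chance, so you should correct for multiple testing
- Don't peek: stopping the experiment as soon as the results look significant inflates the false positive rate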
- Use one OEC (overall evaluation criterion) to balance short term and long term goals
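For the multiple-metrics gotcha, one common correction is the Holm-Bonferroni procedure. The notes do not name a specific method, so this choice is an assumption. The sketch below takes one p-value per metric and returns which ones remain significant after correction; the function name and the example p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null is rejected after correction."""
    m = len(p_values)
    # Visit p-values from smallest to largest, remembering original positions
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens at each step: alpha/m, alpha/(m-1), ..., alpha
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values for four metrics
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
# -> [True, False, False, True]
```

Without correction, all four metrics would pass at alpha = 0.05. After correction only two do.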
How do you detect interference?
LinkedIn clusters its member graph, in which all active members are nodes and their connections are edges, into 10,000 clusters. The clusters are then split between two experiments:
a. an individual-level experiment, where members are randomly assigned to the treatment or control group
b. a cluster-based experiment, where a whole cluster (community) is assigned to the treatment or control group
The intuition is that if there is no network effect, both experiments should yield the same estimated effect.
They then devised a statistical test to determine whether the difference between the two estimates is significant.
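The comparison of the two designs can be illustrated with a simple z-test on the difference between the two effect estimates. This is only a sketch of the idea, not LinkedIn's actual test. It assumes the two experiments are independent, that the cluster-level arm is analyzed on per-cluster mean outcomes, and that samples are large enough for a normal approximation. All function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_in_means(treatment, control):
    """Return (effect estimate, variance of the estimate) for a difference in means."""
    est = mean(treatment) - mean(control)
    var = variance(treatment) / len(treatment) + variance(control) / len(control)
    return est, var


def interference_z_test(ind_t, ind_c, clu_t, clu_c):
    """Two-sided z-test of whether both designs estimate the same effect.

    ind_t, ind_c: per-member outcomes from the individual-level experiment
    clu_t, clu_c: per-cluster mean outcomes from the cluster-based experiment
    """
    est_i, var_i = diff_in_means(ind_t, ind_c)
    est_c, var_c = diff_in_means(clu_t, clu_c)
    # Under no interference, the difference of the estimates is centered on zero
    z = (est_i - est_c) / sqrt(var_i + var_c)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: the cluster-based design shows a much larger effect
ind_t, ind_c = [1.0, 1.2, 0.8, 1.1, 0.9], [0.5, 0.7, 0.3, 0.6, 0.4]
clu_t, clu_c = [3.0, 3.2, 2.8, 3.1, 2.9], [0.5, 0.7, 0.3, 0.6, 0.4]
z, p = interference_z_test(ind_t, ind_c, clu_t, clu_c)
print(z, p)
```

A small p-value suggests the two designs disagree, which is evidence of a network effect. A large p-value does not prove there is no interference, only that this test did not detect it.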